37 research outputs found

    Grid coevolution for adaptive simulations; application to the building of opening books in the game of Go

    This paper presents a successful application of parallel (grid) coevolution to the building of an opening book (OB) in 9x9 Go. Known sayings about the game of Go are rediscovered by the algorithm, and the resulting program was also able to credibly comment on openings in professional games of 9x9 Go. Interestingly, beyond the application to the game of Go, our algorithm can be seen as a "meta"-level of the UCT algorithm: "UCT applied to UCT" (instead of "UCT applied to a random player", as usual) in order to build an OB. It is generic and could be applied as well to analyzing a given situation of a Markov Decision Process.
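    The "UCT applied to UCT" idea can be sketched as a bandit loop over candidate opening moves, where each arm pull would in the full system be an evaluation by a UCT engine. This is only an illustrative sketch, not the paper's method: the toy evaluator, the move names, and all constants below are invented.

    ```python
    import math
    import random

    def ucb1(wins, visits, parent_visits, c=1.4):
        # Standard UCB1 score: empirical mean plus an exploration bonus.
        if visits == 0:
            return float("inf")
        return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

    def build_opening_book(moves, evaluate, budget=1000):
        # Repeatedly pick the opening move with the best UCB1 score and
        # refine its estimate with one more (simulated) evaluation.
        stats = {m: [0.0, 0] for m in moves}  # move -> [wins, visits]
        for t in range(1, budget + 1):
            move = max(moves, key=lambda m: ucb1(stats[m][0], stats[m][1], t))
            stats[move][0] += evaluate(move)  # 1.0 for a win, 0.0 for a loss
            stats[move][1] += 1
        # The book prefers the most-visited (most trusted) openings.
        return sorted(moves, key=lambda m: stats[m][1], reverse=True)

    # Toy evaluator standing in for a full UCT engine playing out the opening.
    random.seed(0)
    true_strength = {"3-3": 0.45, "4-4": 0.55, "5-3": 0.50}
    book = build_opening_book(list(true_strength),
                              lambda m: float(random.random() < true_strength[m]))
    ```

    In the paper's setting, `evaluate` would itself be a UCT search run on a grid, which is what makes the scheme "UCT applied to UCT".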

    Adding expert knowledge and exploration in Monte-Carlo Tree Search

    We present a new exploration term, more efficient than classical UCT-like exploration terms, that efficiently combines expert rules, patterns extracted from datasets, All-Moves-As-First values, and classical online values. As this improved bandit formula does not solve several important situations (semeais, nakade) in computer Go, we present three other important improvements which are central to the recent progress of our program MoGo:
    - We show an expert-based improvement of Monte-Carlo simulations for nakade situations; we also emphasize some limitations of this modification.
    - We show a technique which preserves diversity in the Monte-Carlo simulation, which greatly improves the results in 19x19.
    - Whereas the UCB-based exploration term is not efficient in MoGo, we show a new exploration term which is highly efficient in MoGo.
    MoGo recently won a game with handicap 7 against a 9-Dan professional player, Zhou JunXun, winner of the LG Cup 2007, and a game with handicap 6 against a 1-Dan professional player, Li-Chen Chien.
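    A minimal sketch of how a bandit value can combine these sources of information. This is not MoGo's actual formula: it uses one published RAVE mixing schedule, beta = sqrt(k / (3n + k)), treats the expert/pattern prior as pseudo-counts, and all parameter values are illustrative.

    ```python
    import math

    def blended_value(online_wins, online_visits,
                      rave_wins, rave_visits,
                      prior_mean, prior_visits=10.0,
                      k=1000.0):
        # Expert/pattern knowledge enters as pseudo-counts on the online
        # statistics; the RAVE (AMAF) estimate is mixed in with a weight
        # beta that shrinks as real visits accumulate.
        n = online_visits + prior_visits
        q_online = (online_wins + prior_mean * prior_visits) / n
        q_rave = rave_wins / rave_visits if rave_visits else prior_mean
        beta = math.sqrt(k / (3.0 * n + k))
        return (1.0 - beta) * q_online + beta * q_rave

    # With few online visits the RAVE estimate (0.6 here) dominates...
    early = blended_value(1, 2, 60, 100, prior_mean=0.5)
    # ...with many visits the online mean (~0.40 here) pulls the value down.
    late = blended_value(400, 1000, 60, 100, prior_mean=0.5)
    ```

    The exploration bonus described in the paper would then be added on top of this exploitation value inside the tree-search selection rule.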

    Combining expert, offline, transient and online knowledge for Monte-Carlo tree search

    We combine, for Monte-Carlo tree search, machine learning at four different time scales:
    - online regret, through the use of bandit algorithms and Monte-Carlo estimates;
    - transient learning, through the use of rapid action value estimates (RAVE), which are learnt online and used to accelerate the exploration, then gradually set aside as finer information becomes available;
    - offline learning, by data mining of game datasets;
    - expert knowledge used as prior information.
    The resulting algorithm is stronger than each element taken separately. We also exhibit an exploration-exploitation dilemma in Monte-Carlo tree search and obtain a very strong improvement by tuning the corresponding parameters.
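    The parameter-tuning step mentioned at the end can be illustrated with a toy sweep over an exploration constant, keeping the value with the best empirical win rate. The quadratic win-rate model and every constant below are invented for illustration; real tuning would measure win rates in self-play or against a reference opponent.

    ```python
    import random

    def win_rate(c, rng, games=3000):
        # Stand-in for self-play results: pretend strength peaks near c = 0.3.
        p = 0.5 - (c - 0.3) ** 2
        return sum(rng.random() < p for _ in range(games)) / games

    rng = random.Random(1)
    candidates = [0.0, 0.1, 0.3, 0.7, 1.4]
    best_c = max(candidates, key=lambda c: win_rate(c, rng))
    ```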

    The Computational Intelligence of MoGo Revealed in Taiwan's Computer Go Tournaments

    The authors are extremely grateful to Grid5000 for helping in designing and experimenting around Monte-Carlo Tree Search. In order to promote computer Go and stimulate further development and research in the field, the event activities "Computational Intelligence Forum" and "World 9x9 Computer Go Championship" were held in Taiwan. This study focuses on the invited games played in the tournament "Taiwanese Go players versus the computer program MoGo", held at National University of Tainan (NUTN). Several Taiwanese Go players, including one 9-Dan professional Go player and eight amateur Go players, were invited by NUTN to play against MoGo from August 26 to October 4, 2008. The MoGo program combines All-Moves-As-First (AMAF)/Rapid Action Value Estimation (RAVE) values, online "UCT-like" values, offline values extracted from databases, and expert rules. Additionally, four properties of MoGo are analyzed: (1) the weakness in corners, (2) the scaling over time, (3) the behavior in handicap games, and (4) the main strength of MoGo in contact fights. The results reveal that MoGo can reach the level of 3 Dan with (1) good skills for fights, (2) weaknesses in corners, in particular for "semeai" situations, and (3) weaknesses in favorable situations such as handicap games. It is hoped that advances in artificial intelligence and computational power will enable considerable progress in the field of computer Go, with the aim of reaching the same level as computer chess or Chinese chess in the future.

    Monte-Carlo Tree Search in Backgammon

    Monte-Carlo Tree Search is a new method which has been applied successfully to many games. However, it has never been tested on two-player perfect-information games with a chance factor. Backgammon is the reference game of this category. Today's best Backgammon programs are based on reinforcement learning and are stronger than the best human players. These programs have played millions of offline games to learn to evaluate a position. Our approach consists rather in playing online simulated games to learn how to play correctly in the current position.
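    The online-simulation idea can be sketched on a toy dice game standing in for Backgammon (the race-to-10 rules and all names below are invented): evaluate each legal move in the current position by averaging random playouts, re-rolling the dice in every playout, then play the move with the best mean outcome.

    ```python
    import random

    def playout(my_score, opp_score, rng, target=10):
        # Alternate random dice rolls until one side reaches the target.
        me_to_move = True
        while my_score < target and opp_score < target:
            roll = rng.randint(1, 6)
            if me_to_move:
                my_score += roll
            else:
                opp_score += roll
            me_to_move = not me_to_move
        return 1.0 if my_score >= target else 0.0

    def choose_move(moves, my_score, opp_score, rng, n=500):
        # Each candidate "move" banks some extra points before the playouts;
        # the chance factor (dice) is re-sampled inside every simulation.
        def value(m):
            return sum(playout(my_score + m, opp_score, rng) for _ in range(n)) / n
        return max(moves, key=value)

    rng = random.Random(42)
    best = choose_move([0, 1, 3], my_score=4, opp_score=4, rng=rng)
    ```

    A full MCTS version would grow a tree whose chance nodes branch on the dice roll, but the core of the approach described above is exactly this: simulate from the current position instead of relying on an offline-learned evaluation.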

    Monte-Carlo Tree Search
